{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Further Hypothesis Testing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select this cell and type Ctrl-Enter to execute the code below.\n",
    "\n",
    "library(tidyverse)\n",
    "\n",
    "set_plot_dimensions <- function(width_choice, height_choice) {\n",
    "    options(repr.plot.width=width_choice, repr.plot.height=height_choice)\n",
    "}\n",
    "\n",
    "cbPal <- c(\"#E69F00\", \"#56B4E9\", \"#009E73\", \"#F0E442\", \"#CC79A7\", \"#0072B2\", \"#D55E00\")\n",
    "\n",
    "set_plot_dimensions(5, 4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# You should see \"Attaching packages\" and some ticks by the packages loaded.\n",
    "# The \"Conflicts\" aren't a problem.\n",
    "\n",
    "# Other problems loading the library? Try running this cell.\n",
    "\n",
    "install.packages('tidyverse')\n",
    "\n",
    "library(tidyverse)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2 - Comparing means of two groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run this cell to load the data.\n",
    "\n",
    "data <- read_csv(\"stars.csv\")\n",
    "\n",
    "type_key <- c('Brown Dwarf', 'Red Dwarf', 'White Dwarf', 'Main Sequence', 'Supergiant','Hypergiant')\n",
    "spectral_classes <- c('O','B','A','F','G','K','M')\n",
    "\n",
    "data$type <- factor(data$type)\n",
    "data$spectral_class <- factor(data$spectral_class, levels=spectral_classes)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For now, let's look at only types 4 and 5 (supergiant and hypergiant). These are of particular interest to your supervisor, Dr Howe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "data %>%\n",
    "    filter(type %in% c(4,5)) %>%\n",
    "    ggplot(aes(x=log(luminosity), fill=type)) + \n",
    "        scale_fill_manual(values=cbPal[c(5,6)]) +\n",
    "        geom_histogram(alpha=0.5, bins=10) + \n",
    "        guides(fill=\"none\") +\n",
    "        facet_wrap(~ type, ncol=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dr Howe has noticed that supergiants and hypergiants seem to have very similar luminosity distributions. She asks you to check whether they have the same mean."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Question: do types 4 and 5 have the same mean luminosity?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sample means of log(luminosity) are easy to obtain:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type4 <- \n",
    "    data %>% \n",
    "    filter(type == 4) %>% \n",
    "    pull(luminosity) %>% \n",
    "    log\n",
    "\n",
    "type5 <-\n",
    "    data %>% \n",
    "    filter(type == 5) %>% \n",
    "    pull(luminosity) %>% \n",
    "    log\n",
    "\n",
    "mean4 <- mean(type4)\n",
    "\n",
    "mean5 <- mean(type5)\n",
    "\n",
    "print(paste('Type 4:', mean4))\n",
    "print(paste('Type 5:', mean5))\n",
    "print(paste('difference:', mean4 - mean5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "They are certainly very similar, but is the difference between them statistically significant?\n",
    "\n",
    "<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the histogram, both distributions of log(luminosity) seem approximately symmetrical and with a rough bell-curve, so for now we will assume that they are normally distributed. (We will look later at how to test for normality.)\n",
    "\n",
    "We can therefore choose a **parametric test** for the difference between two means. This means that the test uses a defined probability distribution (e.g. the normal distribution) as a model for the process that generates the data.\n",
    "\n",
    "<br>\n",
    "\n",
    "In general, if the assumptions of a parametric test are satisfied then it will provide more **statistical power** than a non-parametric alternative. Statistical power is defined as the probability that the test *correctly rejects the null hypothesis when it is false*, also known as its *sensitivity* or *true positive rate*.\n",
    "\n",
    "Different parametric test make different **assumptions** about the data, so it is important to think carefully about whether these are satisfied before deciding on a particular test.\n",
    "\n",
    "\n",
    "In this example, a [*t-test*](https://en.wikipedia.org/wiki/Student%27s_t-test) is appropriate:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### t-test for 2 independent groups"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Theory\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When comparing two samples (1 and 2), we will refer to their sizes as $n_1$ and $n_2$, their sample means as $\\bar{x}_1$ and $\\bar{x}_2$ and their sample standard deviations as $s_1$ and $s_2$.\n",
    "\n",
    "Recall that \n",
    "\n",
    "$$\\bar{x} = \\frac{\\sum_{i=1}^n x_i}{n}$$\n",
    "\n",
    "is the *sample mean*\n",
    "\n",
    "and \n",
    "\n",
    "$$s^2 = \\frac{\\sum_{i=1}^n (x_i - \\bar{x})^2}{n-1}$$\n",
    "\n",
    "is the *unbiased sample variance*.\n",
    "\n",
    "<br>\n",
    "\n",
    "For our example, we need a two-tailed test:\n",
    "\n",
    "$H_0$: The two samples come from the same distribution with mean $\\mu = \\mu_1 = \\mu_2$.\n",
    "\n",
    "$H_1$: The samples come from two different distributions, with means $\\mu_1 \\ne \\mu_2$.\n",
    "\n",
    "<br>\n",
    "\n",
    "The test statistic is given by\n",
    "\n",
    "$$t = \\frac{\\bar {x}_1 - \\bar{x}_2}{s_p \\cdot \\sqrt{\\frac{1}{n_1}+\\frac{1}{n_2}}}$$,\n",
    "\n",
    "where \n",
    "\n",
    "$$s_p^2 = \\frac{\\left(n_1-1\\right)s_1^2 + \\left(n_2-1\\right)s_2^2}{n_1 + n_2-2}$$ \n",
    "\n",
    "is an unbiased estimator of the *pooled variance* of the two samples.\n",
    "\n",
    "<br>\n",
    "\n",
    "Under $H_0$, the test statistic $t$ follows a *Student's t-distribution* with $n_1 + n_2 - 2$ degrees of freedom.\n",
    "\n",
    "We use this distribution to calculate a p-value for the observed value of the test statistic, $t$.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Assumptions\n",
    "\n",
    "- The means of the two samples follow normal distributions. This is true if the samples themselves are normal, but also true for any other distribution if $n$ is large (by the *central limit theorem*).\n",
    "- The two populations have equal variance.\n",
    "- The two samples are independent.\n",
    "\n",
    "Two-sample t-tests are robust to moderate deviations from these assumptions, but major deviations may produce misleading results."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Application\n",
    "\n",
    "$H_0$: $\\mu_{\\text{type4}} = \\mu_{\\text{type5}}$.\n",
    "\n",
    "$H_1$: $\\mu_{\\text{type4}} \\ne \\mu_{\\text{type5}}$.\n",
    "\n",
    "Let's set a significance level $\\alpha=0.05$\n",
    "\n",
    "In R, we just supply the data for each sample and the `t.test` function deals with the rest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t.test(type4, type5, var.equal=TRUE, paired=FALSE, alternative=\"two.sided\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can visualise this result on the t-distribution:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "t_obs <- t.test(type4, type5, var.equal=TRUE, paired=FALSE, alternative=\"two.sided\")$statistic\n",
    "df <- length(type4) + length(type5) - 2\n",
    "print(paste(\"degrees of freedom:\", df))\n",
    "\n",
    "tmin <- -4\n",
    "tmax <- 4\n",
    "x <- seq(tmin,tmax,0.01)\n",
    "plot(x, dt(x,df), xlab=\"t\", ylab=\"pdf\", type=\"l\", col=\"grey\")\n",
    "\n",
    "# the area of the shaded region is the two-tailed p-value\n",
    "lower_tail <- seq(tmin,-t_obs,0.01)\n",
    "upper_tail <- seq(t_obs,tmax,0.01)\n",
    "polygon(c(lower_tail,-t_obs,tmin), c(dt(lower_tail,df),0,0), border=NA, col=\"lightgrey\")\n",
    "polygon(c(upper_tail,tmax,t_obs), c(dt(upper_tail,df),0,0), border=NA, col=\"lightgrey\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The t-test p-value is greater than than $\\alpha$, so we accept the null hypothesis that the means are equal."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Other types of t-test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### One-tailed t-test\n",
    "\n",
    "In the example above, we used a *two-tailed test* (because $H_1:\\mu_1 \\ne \\mu_2$ was symmetrical). \n",
    "\n",
    "For a *one-tailed test*, we need to halve the two-sided p-value, e.g. \n",
    "\n",
    "$H_1:\\mu_{\\text{type4}}>\\mu_{\\text{type5}}$ would give us "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot(x, dt(x,df), xlab=\"t\", ylab=\"pdf\", type=\"l\", col=\"grey\")\n",
    "\n",
    "# the area of the shaded region is the one-tailed p-value for H1: mu_1 > mu_2\n",
    "upper_tail <- seq(t_obs,tmax,0.01)\n",
    "polygon(c(upper_tail,tmax,t_obs), c(dt(upper_tail,df),0,0), border=NA, col=\"lightgrey\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In R, we can get the upper tail p-value directly by specifying `alternative=\"greater\"` :"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p_upper_tail <- t.test(type4, type5, var.equal=TRUE, paired=FALSE, alternative=\"greater\")$p.value\n",
    "print(paste('1-tailed p-value:', p_upper_tail))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To test the complementary hypothesis (i.e. $H_1:\\mu_1<\\mu_2$), we would need to use `alternative=\"less\"`, which is equivalent to calculating `1 - p_upper_tail`, e.g. \n",
    "\n",
    "$H_1:\\mu_{\\text{type4}}<\\mu_{\\text{type5}}$ would give us"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot(x, dt(x,df), xlab=\"t\", ylab=\"pdf\", type=\"l\", col=\"grey\")\n",
    "\n",
    "# the area of the shaded region is the one-tailed p-value for H1: mu_1 < mu_2\n",
    "lower_tail <- seq(tmin,t_obs,0.01)\n",
    "polygon(c(lower_tail,t_obs,-tmin), c(dt(lower_tail,df),0,0), border=NA, col=\"lightgrey\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p_complementary <- t.test(type4, type5, var.equal=TRUE, paired=FALSE, alternative=\"less\")$p.value\n",
    "print(paste('complementary p-value:', p_complementary))\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "#### Paired two-sample t-test\n",
    "\n",
    "Sometimes we have two samples with paired observations (for example, luminosity of the same set of stars, as measured on two different dates). This situation requires testing whether the *mean of the differences* between pairs is zero, which is called a [*paired two-sample t-test*](https://en.wikipedia.org/wiki/Student%27s_t-test#Dependent_t-test_for_paired_samples).\n",
    "\n",
    "#### Welch's t-test\n",
    "\n",
    "If the sample sizes in the two groups being compared are equal, Student's original t-test is highly robust to the presence of unequal variances. [*Welch's t-test*](https://en.wikipedia.org/wiki/Welch%27s_t-test) is an alternative that is insensitive to equality of the variances, regardless of whether the sample sizes are similar.\n",
    "\n",
    "#### One-sample t-test\n",
    "\n",
    "For cases where we want to compare a sample against a theoretical mean, we can use the [*one-sample t-test*](https://en.wikipedia.org/wiki/Student%27s_t-test#One-sample_t-test)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Alternatives to the t-test\n",
    "\n",
    "#### Mann-Whitney U-test\n",
    "\n",
    "For non-normal samples where $n$ is small, the assumptions of the t-test break down. However, we can use a *non-parametric test* to compare two samples, whatever the shape of their distributions.\n",
    "\n",
    "The [*Mann-Whitney U-test*](https://en.wikipedia.org/wiki/Mann–Whitney_U_test) (aka Wilcoxon rank-sum test) is one such test. The null hypothesis for this test is that a randomly selected value from sample 1 is equally likely to be less than or greater than a randomly selected value from sample 2. If the distributions are sufficiently different, the resulting p-value will be small and we will reject this null hypothesis. Note that the U-test does not compare the sample means directly.\n",
    "\n",
    "\n",
    "#### Wilcoxon signed-rank test\n",
    "\n",
    "The [*Wilcoxon signed-rank test*](https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test) is is the paired-sample version of the Mann-Whitney U-test.\n",
    "\n",
    "<br>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
